Primitives-based evaluation and estimation of emotions in speech
Abstract
Emotion primitive descriptions are an important alternative to classical emotion categories for describing a human’s affective expressions. We build a multi-dimensional emotion space composed of the emotion primitives of valence, activation, and dominance. In this study, an image-based, text-free evaluation system is presented that provides intuitive assessment of these emotion primitives, and yields high inter-evaluator agreement. An automatic system for estimating the emotion primitives is introduced. We use a fuzzy logic estimator and a rule base derived from acoustic features in speech such as pitch, energy, speaking rate and spectral characteristics. The approach is tested on two databases. The first database consists of 680 sentences of 3 speakers containing acted emotions in the categories happy, angry, neutral, and sad. The second database contains more than 1000 utterances of 47 speakers with authentic emotion expressions recorded from a television talk show. The estimation results are compared to the human evaluation as a reference, and are moderately to highly correlated (0.42 < r < 0.85). Different scenarios are tested: acted vs. authentic emotions, speaker-dependent vs. speaker-independent emotion estimation, and gender-dependent vs. gender-independent emotion estimation. Finally, continuous-valued estimates of the emotion primitives are mapped into the given emotion categories using a k-nearest neighbor classifier. An overall recognition rate of up to 83.5% is accomplished. The errors of the direct emotion estimation are compared to the confusion matrices of the classification from primitives. As a conclusion to this continuous-valued emotion primitives framework, speaker-dependent modeling of emotion expression is proposed since the emotion primitives are particularly suited for capturing dynamics and intrinsic variations in emotion expression. © 2007 Elsevier B.V. All rights reserved.
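The abstract mentions two computations that a short sketch may make more concrete: comparing automatic estimates with the human reference through a correlation coefficient r, and mapping continuous primitive estimates to categories with a k-nearest neighbor classifier. The Python below is only an illustration under stated assumptions, not the authors' implementation. The training points, the [-1, 1] value range, the choice k=3, the Euclidean distance, and the function names `knn_classify` and `pearson_r` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: (valence, activation, dominance), assumed to lie in [-1, 1].
# These values are invented for illustration and are not taken from the paper.
TRAIN = [
    ((0.7, 0.6, 0.3), "happy"),
    ((0.6, 0.5, 0.2), "happy"),
    ((-0.6, 0.8, 0.7), "angry"),
    ((-0.5, 0.7, 0.6), "angry"),
    ((0.0, 0.0, 0.0), "neutral"),
    ((0.1, -0.1, 0.0), "neutral"),
    ((-0.6, -0.6, -0.5), "sad"),
    ((-0.5, -0.7, -0.4), "sad"),
]


def knn_classify(point, train=TRAIN, k=3):
    """Return the majority label among the k training points nearest to `point`.

    Distance is Euclidean in the 3-D primitive space (an assumption).
    On a tied vote, the label of the closest tied neighbor wins, because
    Counter keeps first-seen order and neighbors are visited nearest first.
    """
    nearest = sorted(train, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


if __name__ == "__main__":
    # Classify one hypothetical estimate of the three primitives.
    print(knn_classify((0.65, 0.55, 0.25)))
    # Correlate hypothetical estimates with hypothetical human ratings.
    print(pearson_r([0.1, 0.4, -0.3, 0.8], [0.2, 0.3, -0.4, 0.6]))
```

In this sketch each primitive would be correlated with the human ratings separately, which is one plausible reading of the reported range of r; the abstract does not say how the coefficients were aggregated.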
Related papers
Acoustic Emotion Recognition in Car Environment Using a 3D Emotion Space Approach
The automatic assessment of emotions conveyed in the speech signal has become a rapidly growing research interest in recent years. This paper focuses on a generalized framework to estimate emotions from the speech using an emotion space concept. The performance of such a system is studied in the acoustically demanding environment of vehicular noise while driving. Due to the increas...
Speaker and Listener Variations in Emotion Assessment
In this paper we discuss both the speaker dependent and the listener dependent aspects in the assessment of emotions in speech. These dependencies form a basis to improve current emotion recognition systems as they can be applied in man-machine interaction, for instance. Emotion recognition in speech has gained much attention in recent years [1, 2, 3]. However, human evaluation of ...
Modeling Emotion Expression and Perception Behavior in Auditive Emotion Evaluation
In this paper, we consider both speaker dependent and listener dependent aspects in the assessment of emotions in speech. We model the speaker dependencies in emotional speech production by two parameters which describe the individual’s emotional expression behavior. Similarly, we model the listener’s emotion perception behavior by a simple parametric model. These models form a basis for improv...
Multilingual Speech Emotion Recognition System Based on a Three-Layer Model
Speech Emotion Recognition (SER) systems currently are focusing on classifying emotions on each single language. Since optimal acoustic sets are strongly language dependent, to achieve a generalized SER system working for multiple languages, issues of selection of common features and retraining are still challenging. In this paper, we therefore present a SER system in a multilingual scenario fr...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Journal title:
- Speech Communication
Volume: 49
Publication date: 2007